Missing value attention clustering algorithm based on latent factor model in subspace
Xiaofei WANG, Shengli BAO, Jionghuan CHEN
Journal of Computer Applications, 2023, 43(12): 3772-3778. DOI: 10.11772/j.issn.1001-9081.2022121838

To address the difficulty traditional clustering algorithms have in measuring sample similarity, and the poor quality of the values they fill in when imputing missing samples, a missing value attention clustering algorithm based on the Latent Factor Model (LFM) in subspace was proposed. First, LFM was used to map the original data space to a low-dimensional subspace, reducing the sparsity of the samples. Then, an attention weight graph between different features was constructed by decomposing the feature matrix obtained from the original space, and the similarity calculation between subspace samples was optimized, making the computation of sample similarity more accurate and more general. Finally, to reduce the high time complexity of the sample similarity calculation, a multi-pointer attention weight graph was designed as an optimization. The algorithm was tested on four datasets with values randomly missing at different proportions. On the Hand-digits dataset, compared with KISC (K-nearest neighbors Interpolation Subspace Clustering), an algorithm for high-dimensional data with missing features, the proposed algorithm improved Accuracy (ACC) by 2.33 percentage points and Normalized Mutual Information (NMI) by 2.77 percentage points at a 10% missing rate, and improved ACC by 0.39 percentage points and NMI by 1.33 percentage points at a 20% missing rate, verifying its effectiveness.
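The LFM step described above amounts to factorizing an incomplete sample matrix over its observed entries only, so that each sample gets a dense low-dimensional representation despite missing values. The sketch below is a minimal NumPy illustration of that step; the function name lfm_subspace, all hyperparameters, and the cosine-similarity usage at the end are assumptions for illustration, not the authors' implementation (the paper's attention weight graph and multi-pointer optimization are not reproduced here).

import numpy as np

def lfm_subspace(X, mask, k=10, lr=0.01, reg=0.1, epochs=200):
    # Factorize an incomplete matrix X (n samples x d features) into
    # P (n x k) and Q (d x k) by gradient descent on observed entries only.
    # Rows of P serve as the low-dimensional subspace representations.
    n, d = X.shape
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n, k))
    Q = rng.normal(scale=0.1, size=(d, k))
    for _ in range(epochs):
        E = mask * (X - P @ Q.T)        # residual on observed entries only
        P += lr * (E @ Q - reg * P)     # gradient step for sample factors
        Q += lr * (E.T @ P - reg * Q)   # gradient step for feature factors
    return P, Q

# Example: 100 samples, 20 features, roughly 10% of entries missing.
X = np.random.rand(100, 20)
mask = (np.random.rand(100, 20) > 0.1).astype(float)
P, Q = lfm_subspace(X * mask, mask)
# Cosine similarity between two samples, computed in the learned subspace
# rather than on the sparse original vectors.
sim = P[0] @ P[1] / (np.linalg.norm(P[0]) * np.linalg.norm(P[1]))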

Genotype imputation algorithm fusing convolution and self-attention mechanism
Jionghuan CHEN, Shengli BAO, Xiaofei WANG, Ruofan LI
Journal of Computer Applications, 2023, 43(11): 3534-3539. DOI: 10.11772/j.issn.1001-9081.2022111756

Genotype imputation can compensate for gaps caused by technical limitations by estimating the sample regions not covered in gene sequencing data. However, existing deep learning-based imputation methods cannot effectively capture the linkage among loci across the complete sequence, resulting in low overall imputation accuracy and high dispersion of imputation accuracy across batches of sequences. To address these problems, FCSA (Fusing Convolution and Self-Attention), an imputation method fusing convolution and a self-attention mechanism, was proposed; two fusion modules were used to form the encoder and decoder of the network model. In the encoder fusion module, a self-attention layer obtained the correlations among loci across the complete sequence, and after these correlations were fused into the global loci, local features were extracted by a convolutional layer. In the decoder fusion module, the local features of the encoded low-dimensional vector were reconstructed by convolution, and the complete sequence was modeled and fused by a self-attention layer. Genetic data from multiple animal species were used for model training, and comparison and validation were carried out on the Dog, Pig and Chicken datasets. The results show that, compared with SCDA (Sparse Convolutional Denoising Autoencoders), AGIC (Autoencoder Genome Imputation and Compression) and U-net, FCSA achieves the highest average imputation accuracy at 10%, 20% and 30% missing rates. Ablation results also show that the design of the two fusion modules is effective in improving genotype imputation accuracy.
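As a rough illustration of the encoder fusion module described above (self-attention over the complete locus sequence, followed by convolution for local feature extraction), the PyTorch sketch below shows one plausible arrangement. The class name EncoderFusionBlock, the residual-plus-LayerNorm fusion, and all hyperparameters are assumptions made for this sketch; the abstract does not specify the authors' exact architecture.

import torch
import torch.nn as nn

class EncoderFusionBlock(nn.Module):
    # Self-attention first relates every locus to the complete sequence,
    # then a 1D convolution extracts local features from the fused result.
    def __init__(self, dim, heads=4, kernel_size=5):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                   # x: (batch, seq_len, dim)
        a, _ = self.attn(x, x, x)           # correlations among all loci
        x = self.norm(x + a)                # fuse global context into each locus
        c = self.conv(x.transpose(1, 2))    # local feature extraction
        return c.transpose(1, 2)            # back to (batch, seq_len, dim)

# Example: a batch of 8 sequences, 128 loci each, 32-dim locus embeddings.
x = torch.randn(8, 128, 32)
y = EncoderFusionBlock(dim=32)(x)           # y has shape (8, 128, 32)

A decoder fusion module would, per the abstract, reverse this ordering: convolution to reconstruct local features, then self-attention to model the complete sequence.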
